Counting Suffix Arrays and Strings
نویسندگان
چکیده
Suffix arrays are used in various application and research areas like data compression or computational biology. In this work, our goal is to characterize the combinatorial properties of suffix arrays and their enumeration. For fixed alphabet size and string length we count the number of strings sharing the same suffix array and the number of such suffix arrays. Our methods have applications to succinct suffix arrays and build the foundation for the efficient generation of appropriate test data sets for suffix array based algorithms. We also show that summing up the strings for all suffix arrays builds a particular instance for some summation identities of Eulerian numbers.
منابع مشابه
Translation of Multiword Expressions Using Parallel Suffix Arrays
Accurately translating multiword expressions is important to obtain good performance in machine translation, crosslanguage information retrieval, and other multilingual tasks in human language technology. Existing approaches to inducing translation equivalents of multiword units have focused on agglomerating individual words or on aligning words in a statistical machine translation system. We p...
متن کاملLightweight Parameterized Suffix Array Construction
We present a first algorithm for direct construction of parameterized suffix arrays and parameterized longest common prefix arrays for non-binary strings. Experimental results show that our algorithm is much faster than näıve methods.
متن کاملFaster Suffix Tree Construction with Missing
We consider suffix tree construction for situations with missing suffix links. Two examples of such situations are suffix trees for parameterized strings and suffix trees for two-dimensional arrays. These trees also have the property that the node degrees may be large. We add a new backpropagation component to McCreight’s algorithm and also give a high probability hashing scheme for large degre...
متن کاملComputing Longest Common Substrings Via Suffix Arrays
Given a set of N strings A = {α1, . . . , αN} of total length n over alphabet Σ one may ask to find, for a fixed integer K, 2 ≤ K ≤ N , the longest substring β that appears in at least K strings in A. It is known that this problem can be solved in O(n) time with the help of suffix trees. However, the resulting algorithm is rather complicated. Also, its running time and memory consumption may de...
متن کاملString Inference from the LCP Array
The suffix array, perhaps the most important data structure in modern string processing, is often augmented with the longest common prefix (LCP) array which stores the lengths of the LCPs for lexicographically adjacent suffixes of a string. Together the two arrays are roughly equivalent to the suffix tree with the LCP array representing the tree shape. In order to better understand the combinat...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2005